Abstract - A broad set of sufficient conditions that guarantees
the existence of the maximum entropy (maxent) distribution 
consistent with specified bounds on certain generalized moments 
is derived. Most results in the literature are either focused 
on the minimum cross-entropy distribution or apply only to 
distributions with a bounded-volume support or address only 
equality constraints. The results of this work hold for general 
moment inequality constraints for probability distributions with 
possibly unbounded support, and the technical conditions are 
explicitly on the underlying generalized moment functions. 
An analytical characterization of the maxent distribution is 
also derived using results from the theory of constrained 
optimization in infinite-dimensional normed linear spaces. 
Several auxiliary results of independent interest pertaining to 
certain properties of convex coercive functions are also presented. 

Keywords: Coercive functions, Constrained optimization, 
Convex analysis, Cross-entropy, Differential entropy, Maximum 
entropy methods. 



I. Introduction 

Consider the problem of estimating a signal from "noisy" 
observations when we have complete information about the 
statistics of the observation process but only partial prior (sta- 
tistical) information about the signal of interest. Partial prior 
information about the signal probability distribution might be 
available in the form of bounds on a restricted set of certain 
general moment measurements. Incompleteness in the prior 
information is with regard to the underlying signal probability 
distribution that is consistent with the measurements. There 
arises the question of selecting a distribution from the feasible 
ones that is noncommittal with respect to missing information. 
The maxent principle provides a selection mechanism that 
enjoys several appealing optimality properties [1]-[7].

Questions of existence and characterization of the maxent 
distribution in a collection of probability distributions over a 
finite-dimensional Euclidean space are, in general, problems 
in infinite dimensional constrained optimization involving sev- 
eral subtleties, and many derivations in the literature contain 
errors¹. Although the form of the maxent distribution subject to
general moment equality constraints has long been known,
there has been little systematic investigation into its validity 
and the existence of the maxent distribution. Most results in 
the literature are either focused on the minimum cross-entropy 
distribution or apply only to distributions with a bounded- 
volume support. A key difficulty in extending such existence 
and characterization results from cross-entropy to differen- 
tial entropy is that unlike cross-entropy which is always 
well-defined, nonnegative, and satisfies a joint lower semi- 
continuity property, differential entropy is not always well- 
defined and lacks a crucial upper-semicontinuity property that 
is needed for establishing existence results along the lines of 
those for cross-entropy. 

Building upon results due to Csiszár and Topsøe [1], [9], we
provide broad sufficient conditions on general convex families 
of distributions that guarantee the existence of the maxent 
distribution in the family. We also specialize these existence 
results to specific convex families of probability distributions 
defined through general moment inequality constraints. We 
also provide an analytical characterization of the maxent 
distribution for such general moment-constrained families. 
Our existence and characterization results hold for probability 
densities over a finite-dimensional Euclidean space, that is, 
finite-dimensional probability distributions that are absolutely 
continuous with respect to the Lebesgue measure, although 
they can be extended to general finite-dimensional sigma- 
finite measures also. For results pertaining to specific con- 
vex families of distributions defined through general moment 
inequality constraints, a finite number of constraints is as- 
sumed although the results can be extended when there are 
a countable number of constraints. Our results apply for both 
differential entropy and I-divergence although we state and 
prove results only for differential entropy. 

Existence and characterization results for a family of com- 
pactly supported probability densities on the real line with 
a prescribed mean and variance (moment equality constraints) 
are presented in [10]. The analysis in [9] is exclusively devoted 
to I-divergence (which requires a reference measure) and not 
differential entropy and the existence results were stated only 
in terms of the convexity and variational completeness of the 
feasible set of distributions. Unlike the results in [9] which 
are in terms of general conditions on the convex collections
of distributions satisfying general moment constraints with
equality, which might be difficult to check in practice, our
results are for general moment inequality constraints, and the
technical conditions are explicitly on the underlying moment
functions². The results presented in [1] hold for probability
distributions over a countable space and the existence results
therein pertain to the center of attraction of a convex
collection of distributions. The relationship between the center
of attraction of a family of densities defined via moment
equality constraints³ and the maximum-likelihood estimate in
an associated exponential family of densities is derived in [11].
Borwein and Limber in [8] also provide a set of sufficient
conditions for the existence of the maxent distribution and
characterize its form, but these results differ from ours in
several aspects. Their results are for equality constraints; ours
are for inequality constraints. The underlying space in their
analysis is the real line; our analysis is on $\mathbb{R}^d$. Their analysis
considers distributions with bounded support; ours
allows distributions with unbounded support.

¹ See Borwein and Limber [8] for references to these nonrigorous derivations.

For a collection of distributions satisfying moment-equality 
constraints, the maxent distribution, when it exists, has an 
exponential form where the exponent belongs to the closed 
subspace spanned by the measurement functions [8]. The ad- 
ditional flexibility allowed by inequality constraints leads to a 
stronger characterization of the maxent distribution. We show, 
not surprisingly, that under moment inequality constraints and 
mild regularity assumptions, the maxent distribution has an 
exponential form where the exponent belongs to the nega- 
tive cone generated by the measurement functions. In many 
applications, inequality constraints are perhaps more com- 
monly encountered than equality constraints. With equality 
constraints, it is often difficult to verify the existence of a 
maxent solution because of possible errors in the estimated 
moments. The conditions of our existence and characterization 
theorem are application-oriented in the sense that if the 
measurement functions meet certain general requirements, the 
maxent solution exists and has a special exponential form. We 
have learnt (thanks to an anonymous reviewer) about another 
work by Csiszár which addresses inequality constraints [12].
However, those results are for the minimum cross-entropy 
problem and it is not clear how they could be extended 
to the maxent problem especially when the support set has 
unbounded volume - an important consideration in our work. 
Other general references where inequality constraints have 
been considered include [13, Section 13.1.4] and [14]. 

We provide two sets of sufficient conditions on the un- 
derlying constraint functions that guarantee the existence of 
the maxent distribution. In one set of sufficient conditions, 
the proof hinges on the assumption that the distributions of 
interest have supports that are contained in a finite volume 
subset of $\mathbb{R}^d$ that need not be bounded. The second set of
sufficient conditions removes this restriction by assuming the 
presence of a general "stabilizing" moment constraint in the 
definition of the feasible collection of distributions. We also
present a rich class of "well-behaved" functions that provide
the general "stabilizing" moment constraints guaranteeing the
existence of the maxent distribution. Frequently encountered
constraints such as mean quadratic energy and mean absolute
energy are well-behaved. These well-behaved constraints have
several interesting and intuitively appealing properties that are
of independent interest.

² We use the terms moment function and measurement function interchangeably.

³ The moments were with respect to a σ-finite reference measure over a
general measurable space.

In Section II we provide some background, define all
important terms, and state the maxent problem. In Section III
we state the main results of this work - fundamental theorems
on the existence and characterization of the maxent distribution 
consistent with specified moment inequality constraints. Proofs 
of these theorems and related results of independent interest 
are presented in the appendices. 

II. Background and problem statement 
Notation: $\mathbb{R}$ denotes the set of real numbers,

$$\overline{\mathbb{R}} := \mathbb{R} \cup \{+\infty, -\infty\}$$

the set of extended real numbers, and $\mathbb{R}^d$ the $d$-dimensional
real Euclidean space. Vectors are denoted by boldface letters,
for example, $\mathbf{x} \in \mathbb{R}^d$, and finite dimensional vectors are
treated as column vectors. All sets in this work are Lebesgue-measurable.
If $A$ and $B$ are Lebesgue-measurable subsets of
$\mathbb{R}^d$, then the statement $A = B$ means that the set of points not
simultaneously in both $A$ and $B$ has Lebesgue measure zero
and $A$ is said to be equal to $B$ almost everywhere (a.e.). All
functions in this work take values in $\mathbb{R}$ and are measurable
with respect to the Lebesgue measure over $\mathbb{R}^d$. Inequalities
involving measurable functions are to be understood in the a.e.
sense. All integrals are in the sense of Lebesgue. A probability
density function (pdf) is a measurable function $\pi(\mathbf{x})$ on $\mathbb{R}^d$
that is non-negative almost everywhere (a.e.) and integrates
to unity over $\mathbb{R}^d$. All results in this work are stated for
probability densities over finite-dimensional Euclidean spaces,
that is, probability distributions that are absolutely continuous
with respect to the Lebesgue measure, although they can
be extended to general sigma-finite measures on $\mathbb{R}^d$ also.
$\mathcal{L}^1(\mathbb{R}^d)$ and $\mathcal{L}^\infty(\mathbb{R}^d)$ respectively denote the set of
absolutely-integrable functions over $\mathbb{R}^d$ and the set of essentially bounded
functions [15, p. 119] over $\mathbb{R}^d$. For convenience, we shall often
omit the 'x' and the 'dx' that appear inside an integral. Thus,

$$\int_A f(\mathbf{x})\, d\mathbf{x}$$

will often be abbreviated to $\int_A f(\mathbf{x})$ or simply $\int_A f$. The
symbol $\pi$ and its variants will denote pdfs and

$$E_\pi[\phi] := \int \phi\, \pi$$

denotes the mathematical expectation of the function $\phi(\mathbf{x})$
under the pdf $\pi(\mathbf{x})$. The support of a function $f(\mathbf{x})$ is the set
of points where it is nonzero⁴ and is denoted by $\operatorname{supp}(f)$. The
indicator or characteristic function of a subset $A$ of $\mathbb{R}^d$, denoted
by $1_A(\mathbf{x})$, is the function that is equal to one over $A$ and
zero elsewhere. The volume of a Lebesgue-measurable subset
$S$ of $\mathbb{R}^d$ is its Lebesgue measure and is denoted by $|S|$. In
addition to the arithmetic of the extended reals, the following
conventions regarding infinity are adopted in keeping with
measure-theoretically consistent operations:

$$\ln 0 = -\infty, \qquad \ln\frac{a}{0} = +\infty \ \ \forall a > 0, \qquad 0 \cdot (\pm\infty) = 0.$$

Thus $0 \ln 0 = 0$, which also agrees with the limiting value of
the quantity $t \ln t$ as the variable $t$ decreases to zero.

⁴ Note that we are working with probability density functions.

In Bayesian inference, signals of interest are modeled as 
high-dimensional real random vectors with associated pdfs 
referred to as prior distributions on the signals. Let $\mathbf{X} \in \mathbb{R}^d$
have an underlying $d$-dimensional pdf denoted by $\pi(\mathbf{x})$. In
many applications, only limited information about $\pi(\mathbf{x})$ can be
gathered. Moments of probability distributions are often used 
to describe the underlying statistical structure of a stochastic 
process. For example, the set of all finite-order moments of 
a scalar random variable provides, under suitable regularity 
assumptions, a complete statistical description of the random 
variable [16, Theorem 30.1, p. 388]. In practice, only a 
finite set of moments is a priori known or can be estimated 
(measured) from samples. In many cases even these are not 
available but bounds on the moments are available. The bounds 
may be regarded as arising from the impreciseness of moment 
measurements. For example, for $p > 0$, the empirical mean
$\mathcal{L}^p$ energies of wavelet coefficients in different subbands are
often used to construct statistical models for images [17]-[19].
In general, the limited information will be unable to
single out a desirable distribution that is consistent with the 
moment constraints. The limited information would rather 
specify a whole class of distributions that satisfy the moment 
constraints. 

Let prior information about a random vector $\mathbf{X}$ be available
in terms of upper bounds on the expected values of certain
real-valued Lebesgue-measurable (measurement) functions

$$\phi_\gamma : \mathbb{R}^d \to \mathbb{R}, \quad \gamma \in \Gamma,$$

where $\Gamma$ is a finite index set⁵. A useful notion is that we can
sometimes design these functions $\phi_\gamma(\mathbf{x})$ (that is, the measurements).
Each candidate distribution $\pi(\mathbf{x})$ that is consistent with
these measurements then belongs to the set

$$\Omega(\mathbf{u}) := \{\text{pdf } \pi : \operatorname{supp}(\pi) \subseteq S, \text{ and for all } \gamma \text{ in } \Gamma,\ E_\pi[\phi_\gamma] \le u_\gamma < +\infty\}, \tag{2.1}$$

where $S$ is a closed Lebesgue-measurable subset of $\mathbb{R}^d$ having
nonzero but possibly infinite volume and

$$\mathbf{u} := \{u_\gamma \in \mathbb{R}\}_{\gamma \in \Gamma}$$

is a finite-dimensional, real-valued vector of moment upper bounds.
We assume that the only prior information available
is expressed by the moment constraints of $\Omega$. Since $\Omega$ is
defined through inequality constraints that are linear in $\pi$,
it is a convex set of probability distributions. It is possible
to implicitly incorporate support constraints into $\Omega$ through
appropriate moment inequalities without explicitly requiring
that $\operatorname{supp}(\pi) \subseteq S$ in the definition. For example, if

$$u_0 = u_1 = 0,$$

and

$$\phi_0(\mathbf{x}) := -\phi_1(\mathbf{x}) := 1 - 1_S(\mathbf{x})$$

then for each $\pi$ belonging to $\Omega$, we have $|\operatorname{supp}(\pi) \setminus S| = 0$.
For clarity of exposition we shall primarily work with the
convex collection (2.1). However, it is quite straightforward
to extend our results to convex collections having individual
lower bounds $\{l_\gamma \in \mathbb{R}\}_{\gamma \in \Gamma}$ on the moment measurements.

⁵ The focus of this work is on the case when the number of measurement
functions is finite but the results can also be extended to the case when there
are a countable number of measurement functions.

In general, many distributions will satisfy the moment
constraints of $\Omega$. The choice of a distribution from this moment
consistent class depends upon the goals to be achieved by
the selection. For the application of lossless compression, a
clear answer can be given. The unique pdf that maximizes the
differential entropy functional

$$h(\pi) := -E_\pi[\ln \pi]$$

over a convex set $F$, whenever it exists, also minimizes
the worst-case rate for encoding repeated independent observations
of $\mathbf{X}$ "losslessly" [20, pp. 105-106], [7, pp. 61-63],
[1, Theorem 3, p. 16] (the results in [1], [7] are for
discrete entropy). A similar result holds for high-rate lossy
compression [6].

Definition 2.1: (Maximum entropy distribution) Let $F$ be a
convex collection of distributions for which

$$F \cap \{\text{pdf } \pi : h(\pi) > -\infty\}$$

is nonempty. The maxent distribution in $F$, whenever it exists,
is the unique pdf $\pi_{ME}$ belonging to $F$ satisfying⁶

$$h(\pi_{ME}) = \max_{\pi \in F} h(\pi).$$

It may be noted that since $h(\pi)$ is a concave functional [21],
the set

$$\{\text{pdf } \pi : h(\pi) > -\infty\}$$

is convex. The uniqueness of $\pi_{ME}$ follows from the strict
concavity of the differential-entropy functional [21] and the
convexity of $F$.

In addition to being minimax optimal for the application of 
lossless compression with uncertain source statistics discussed 
above, the maxent distribution is also "maximally noncom- 
mittal" with respect to missing information while satisfying 
prior constraints [4]. Shore and Johnson in [2] show that if a 
distribution has to be picked from a class of probability dis- 
tributions by maximizing a functional satisfying some natural 
postulates, it must necessarily be the maxent functional. Again, 
in a study of logically consistent methods of inference, Csiszar 
demonstrates that the maxent distribution is the only one that 
satisfies two different intuitively appealing axiom systems [5]. 
These properties of the maxent distribution make it a desirable 
choice for signal estimation. 

⁶ The subscript ME stands for maximum entropy.

In some applications, based on previous measurements, a
reliable reference distribution $r(\mathbf{x})$ for the signal of interest
is available. New moment measurements might reveal that the
reference distribution has inconsistencies with new information
in the form of bounds on moments (2.1). The situation
suggests a revision of the reference model while not ignoring
earlier measurements. An attractive model selection criterion
in this situation is to select the distribution in $\Omega$ that is closest
to the reference distribution in the sense that it has minimum
cross-entropy (MCE) relative to the reference prior:

Definition 2.2: (Cross-entropy [9, p. 146]) The cross-entropy
of pdf $\pi_1(\mathbf{x})$ with respect to pdf $\pi_2(\mathbf{x})$ (also known as
the I-divergence, Kullback-Leibler distance, relative entropy,
and information discrimination), denoted by $D(\pi_1 \| \pi_2)$, is
defined as:

$$D(\pi_1 \| \pi_2) := \begin{cases} E_{\pi_1}\!\left[\ln \dfrac{\pi_1}{\pi_2}\right] & \text{if } \pi_1 \ll \pi_2 \text{ (see Definition A.3)}, \\ +\infty & \text{otherwise.} \end{cases}$$

Definition 2.3: (I-projection [9, p. 147]) Let $r$ be a pdf and
$F$ a convex collection of priors such that

$$F \cap \{\text{pdf } \pi : D(\pi \| r) < +\infty\}$$

is nonempty. The I-projection of $r$ onto $F$, whenever it exists,
is the unique pdf $\pi_{MCE}$ belonging to $F$ satisfying

$$D(\pi_{MCE} \| r) = \min_{\pi \in F} D(\pi \| r).$$

The updated distribution $\pi_{MCE}$ is referred to as the
I-projection of $r$ onto $F$. Since $D(\pi \| r)$ is strictly convex in
$\pi$ [21], and $F$ is a convex set, $\pi_{MCE}$ is unique whenever it
exists.

Generally speaking, the maxent distribution in $\Omega$ (2.1) need
not exist. Our goal is to provide a set of sufficient conditions
on the measurement functions that guarantee the existence
of the maxent prior. We provide such a set of conditions in
the following section. We also characterize the form of the
maxent prior. Similar existence and characterization results
for I-projection under moment inequality constraints can be
derived along similar lines but are omitted from the present
work (see [9], [12], [22], [23]).

III. Existence and characterization of the maxent distribution

The following theorem, proved in Appendix B.1, provides a
characterization of the unique maxent distribution in $\Omega$ subject
to suitable technical conditions.

Theorem 3.1: (Characterization of the maxent distribution)
Let $\Omega(\mathbf{u})$ be as in (2.1). Let there exist a pdf $\pi_0$ in $\Omega(\mathbf{u})$ such
that for all $\gamma$ in $\Gamma$, $E_{\pi_0}[\phi_\gamma] < u_\gamma$. If the unique maxent pdf
$\pi_{ME}$ belonging to $\Omega(\mathbf{u})$ exists and $h(\pi_{ME})$ is finite, then the
maxent pdf has the form

$$\pi_{ME}(\mathbf{x}, \mathbf{u}) = 1_{S_{ME}}(\mathbf{x}) \cdot \exp\Big\{ -\alpha(\mathbf{u}) - \sum_{\gamma \in \Gamma} \lambda_\gamma(\mathbf{u})\, \phi_\gamma(\mathbf{x}) \Big\}, \tag{3.1}$$

where $S_{ME} := \operatorname{supp}(\pi_{ME}) \subseteq S$ satisfies $E_\pi[1_{S \setminus S_{ME}}] = 0$
for every $\pi \in \Omega(\mathbf{u})$ for which $-\infty < h(\pi)$ and

$$\alpha(\mathbf{u}) = \ln \Big( \int_{S_{ME}} \exp\Big\{ -\sum_{\gamma \in \Gamma} \lambda_\gamma(\mathbf{u})\, \phi_\gamma(\mathbf{x}) \Big\}\, d\mathbf{x} \Big)$$

is a finite normalization constant. The parameters $\{\lambda_\gamma(\mathbf{u})\}_{\gamma \in \Gamma}$
are all nonnegative, and satisfy

$$\sum_{\gamma \in \Gamma} \lambda_\gamma \big( E_{\pi_{ME}}[\phi_\gamma] - u_\gamma \big) = 0. \tag{3.2}$$

Moreover,

$$h(\pi_{ME}) = \alpha(\mathbf{u}) + \sum_{\gamma \in \Gamma} \lambda_\gamma(\mathbf{u})\, E_{\pi_{ME}}[\phi_\gamma] = \alpha(\mathbf{u}) + \sum_{\gamma \in \Gamma} u_\gamma \lambda_\gamma(\mathbf{u}).$$

Remark 3.1: Note that if $\pi$ belongs to $\Omega(\mathbf{u})$ and $-\infty < h(\pi)$,
then $\pi \ll \pi_{ME}$. If there exists a pdf $\pi$ in $\Omega(\mathbf{u})$ with
$-\infty < h(\pi)$ and $\operatorname{supp}(\pi) = S$, then the set $S \setminus S_{ME}$ has zero
volume; that is, $S_{ME}$ almost everywhere coincides with $S$ and
we may take $S_{ME} = S$ in the above theorem.

Remark 3.2: The numbers $\{\lambda_\gamma\}_{\gamma \in \Gamma}$ in Theorem 3.1 above
are Lagrange multipliers associated with the moment constraints
of $\Omega(\mathbf{u})$ in (2.1). The constraint qualification (3.2)
implies that $\lambda_\gamma = 0$ if constraint $\gamma$ is inactive, that is,

$$E_{\pi_{ME}}[\phi_\gamma] < u_\gamma.$$

Remark 3.3: Since $S$ has nonzero volume and $\pi_{ME}$ is
unique, if the measurement functions $\{\phi_\gamma\}_{\gamma \in \Gamma}$ are linearly
independent then there is a unique choice for the parameters
$\boldsymbol{\lambda} := \{\lambda_\gamma\}_{\gamma \in \Gamma}$ that satisfies the moment constraints of $\Omega(\mathbf{u})$.
In this case, the mapping from the vector of moment bounds
$\mathbf{u}$ to the vector of Lagrange multipliers $\boldsymbol{\lambda}$ is a function, that
is, it is not a one-to-many map. If the measurement functions
are not linearly independent, the characterization theorem still
holds, but the Lagrange multipliers need not be unique.

Remark 3.4: The Lagrange multipliers $\boldsymbol{\lambda}(\mathbf{u})$ are usually
implicit functions of the moment bounds $\mathbf{u}$. If for some value
of $\mathbf{u}$ a Lagrange multiplier turns out to be zero, that is,
$\lambda_\gamma(\mathbf{u}) = 0$ for some $\gamma \in \Gamma$ (a situation that will arise if the
associated moment constraint is inactive, that is, $E_{\pi_{ME}}[\phi_\gamma] < u_\gamma$),
then the maxent solution corresponding to any larger
value of $u_\gamma$ will remain the same (see Appendix B.2 for a
proof). Thus, the map $\boldsymbol{\lambda}(\mathbf{u})$ from moment bounds to Lagrange
multipliers is in general not injective. However, see the
following remark.

Remark 3.5: The mapping from the moment upper bounds
$\mathbf{u}$ to the Lagrange multipliers $\boldsymbol{\lambda}(\mathbf{u})$ is one-to-one when the
domain is restricted to the set of those values of $\mathbf{u}$ for which
$\lambda_\gamma(\mathbf{u}) > 0$ for every $\gamma$ in $\Gamma$, that is, all the constraints are
active. This fact can be seen by the following argument.
Suppose that $\{u_\gamma^{(1)}\}_{\gamma \in \Gamma}$ and $\{u_\gamma^{(2)}\}_{\gamma \in \Gamma}$ both map to the same set
of strictly positive Lagrange multipliers $\{\lambda_\gamma > 0\}_{\gamma \in \Gamma}$. Then
because all constraints are active, due to (3.2), necessarily

$$u_\gamma^{(1)} = E_{\pi_{ME}}[\phi_\gamma(\mathbf{x})] = u_\gamma^{(2)}$$

for every $\gamma$ in $\Gamma$.

Theorem 3.1 asserts that whenever the maxent distribution
in a moment-consistent class exists then, subject to some mild
technical conditions, it has a natural exponential form given
by (3.1). The next result, proved in Appendix B.3, essentially
asserts that if a pdf having the exponential form given by (3.1)
is moment consistent then it must be the maxent distribution
for the moment-consistent class. In this sense, the next result
is a converse to Theorem 3.1.

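Theorem 3.2: (Converse to the characterization theorem)
Let $\Omega(\mathbf{u})$ be as in (2.1). Consider a pdf

$$\pi_{\exp}(\mathbf{x}, \boldsymbol{\lambda}) := 1_{S_{\exp}}(\mathbf{x}) \cdot \exp\Big\{ -\alpha - \sum_{\gamma \in \Gamma} \lambda_\gamma(\mathbf{u})\, \phi_\gamma(\mathbf{x}) \Big\},$$

where $S_{\exp}$ is a measurable subset of $S$, and the vector
of nonnegative but finite-valued parameters $\{\lambda_\gamma(\mathbf{u})\}_{\gamma \in \Gamma}$ is
denoted by $\boldsymbol{\lambda}$. If

(i) $\pi_{\exp}$ belongs to $\Omega(\mathbf{u})$,

(ii) $E_\pi[1_{S \setminus S_{\exp}}] = 0$ for every $\pi \in \Omega(\mathbf{u})$ for which $-\infty < h(\pi)$, and

(iii) $\sum_{\gamma \in \Gamma} \lambda_\gamma \big( E_{\pi_{\exp}}[\phi_\gamma] - u_\gamma \big) = 0$,

then $\pi_{\exp}$ is the unique maxent pdf in $\Omega(\mathbf{u})$ and

$$h(\pi_{\exp}) = \alpha + \sum_{\gamma \in \Gamma} \lambda_\gamma(\mathbf{u})\, E_{\pi_{\exp}}[\phi_\gamma]$$

is finite.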
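
As an illustrative check of the converse theorem (not part of the original development), consider d = 1, S = [0, +inf), a single measurement function phi(x) = x, and a bound u > 0. The candidate pi_exp(x) = exp(-alpha - lambda x) with lambda = 1/u and alpha = ln u is the exponential density with mean u. The sketch below verifies numerically that this candidate is normalized, meets the moment bound with equality so that condition (iii) holds, and has entropy alpha + lambda u = ln u + 1. The function names, the truncation point, and the midpoint-rule quadrature are assumptions made for this illustration.

```python
import math

def midpoint_integral(f, a, b, n=100_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def check_exponential_candidate(u):
    """Numerically check the converse theorem for phi(x) = x on S = [0, inf)."""
    lam = 1.0 / u            # candidate Lagrange multiplier
    alpha = math.log(u)      # candidate normalization constant
    pdf = lambda x: math.exp(-alpha - lam * x)
    upper = 60.0 * u         # truncation point; the neglected tail mass is e^-60
    mass = midpoint_integral(pdf, 0.0, upper)
    moment = midpoint_integral(lambda x: x * pdf(x), 0.0, upper)
    entropy = midpoint_integral(lambda x: -pdf(x) * math.log(pdf(x)), 0.0, upper)
    slackness = lam * (moment - u)      # left-hand side of condition (iii)
    predicted = alpha + lam * u         # entropy predicted by the theorem
    return mass, moment, entropy, predicted, slackness
```

For u = 2 the numerically integrated entropy agrees with the predicted value ln 2 + 1 up to quadrature error, and the slackness term vanishes because the single constraint is active.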

Before entering into sufficient conditions for the existence of 
the maxent distribution, we would like to briefly comment on 
some practical aspects of computing the Lagrange multipliers 
from given moment constraints. The infinite-dimensional con- 
strained entropy maximization problem can be converted to a 
finite-dimensional convex minimization problem by invoking 
Lagrange duality theory [20, pp. 21-24]. This forms the 
basis for developing numerical techniques for computing the 
Lagrange multipliers that characterize the maxent distribution. 
Several algorithms based on iterative gradient-projection or 
moment-matching procedures having different convergence 
properties have been proposed in the literature, for exam- 
ple, Bregman's balancing method, multiplicative algebraic 
reconstruction technique, generalized iterative scaling method, 
Newton's method, interior-point methods, etc. [24]. However, 
these algorithms have been largely applied to problems where 
the underlying space is a finite set and require evaluating 
moments at each step. This task can be nontrivial if the 
underlying space is R d and d is large, as in the case of 
images, because moment computation will involve evaluating 
very high dimensional integrals. One would typically need 
to take recourse to computationally intensive algorithms like 
importance sampling or Markov-chain Monte-Carlo for nu- 
merically evaluating the high-dimensional integrals at each 
step. However, in certain situations it might be possible to take 
advantage of the structure of the specific moment functions 
to develop fast heuristic approximations for the Lagrange 
multipliers [20, Chapter 4], [17]-[19]. 
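
The duality step described above can be sketched on a toy problem. Assume d = 1, S = [0, 1], a single measurement function phi(x) = x, and a bound u > 0. The multiplier then minimizes the convex dual function g(lambda) = alpha(lambda) + lambda u over lambda >= 0, where alpha(lambda) = ln of the integral over S of exp(-lambda phi). Its derivative is u - E_lambda[phi], with E_lambda denoting expectation under the density proportional to exp(-lambda phi), so the minimizer can be located by bisection. The toy problem, the function names, and the quadrature rule are illustrative assumptions and are not the algorithms surveyed in [24].

```python
import math

def tilted_mean(lam, n=2000):
    """E[x] under the density proportional to exp(-lam * x) on [0, 1] (midpoint rule)."""
    xs = [(i + 0.5) / n for i in range(n)]
    ws = [math.exp(-lam * x) for x in xs]
    return sum(x * w for x, w in zip(xs, ws)) / sum(ws)

def solve_multiplier(u, tol=1e-10):
    """Minimize g(lam) = alpha(lam) + lam * u over lam >= 0 by bisection on g'."""
    if u <= 0.0:
        raise ValueError("need u > 0: no density on [0, 1] has mean <= 0")
    if tilted_mean(0.0) <= u:
        return 0.0                  # constraint inactive: the uniform density is maxent
    lo, hi = 0.0, 1.0
    while tilted_mean(hi) > u:      # grow the bracket until g'(hi) >= 0
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if tilted_mean(mid) > u:    # g'(mid) < 0, so the minimizer lies to the right
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Calling solve_multiplier(0.3) returns a strictly positive multiplier whose tilted density has mean 0.3. With u = 0.7 the bound exceeds the mean 0.5 of the uniform density, so the constraint is inactive and the multiplier is zero, in line with Remark 3.2. In high dimensions the quadrature in tilted_mean is exactly the step that becomes expensive.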

Theorem 3.3: (Existence of the maxent distribution - finite
volume support constraint) Let $S$ be a closed, Lebesgue-measurable
subset of $\mathbb{R}^d$ having nonzero but finite volume.
If $F$ is a nonempty, convex, $\mathcal{L}^1$-complete collection of pdfs
over $S$ and $-\infty < h(\pi_0)$ for at least one pdf $\pi_0$ belonging to
$F$, then

$$h(F) := \sup_{\pi \in F} h(\pi) \in \mathbb{R},$$

that is, $h(F)$ is finite, and there exists a unique maxent pdf in
$F$.

Corollary 3.4: Let $\Omega(\mathbf{u})$ be as in (2.1). Let $\{\phi_\gamma\}_{\gamma \in \Gamma}$ be
uniformly bounded from below by $L \in \mathbb{R}$ and $S$ have nonzero
but finite volume. If $\Omega$ is nonempty and

$$C := \bigcap_{\gamma \in \Gamma} \{\mathbf{x} \in S : \phi_\gamma(\mathbf{x}) < u_\gamma\}$$

has nonzero volume then there exists a unique maxent pdf
$\pi_{ME}$ in $\Omega(\mathbf{u})$ having the exponential form given by
Theorem 3.1 with $h(\pi_{ME}) \in \mathbb{R}$.

The proof of Theorem 3.3 appears in Appendix C.1. The
proof of Corollary 3.4 appears in Appendix C.2. While the
finite measure condition is crucial to the proofs of Theorem 3.3
and Corollary 3.4, the next theorem and corollary show that
the existence of the maxent distribution is guaranteed by the
presence of a "stabilizing" constraint function in the definition
of $\Omega$ even if the support set's volume is not finite. The proofs
of these results appear in Appendix C.3 and Appendix C.4
respectively. We would like to point out that the sufficient
conditions for existence mentioned in [9] and the corollary
following Theorem 5.2 in [25] for the cross-entropy problem
are not available for differential entropy unless attention is
restricted to distributions supported on a set of finite Lebesgue
measure, due to the lack of a general upper-semicontinuity
property for differential entropy. It is not immediately clear
how those results can be extended to distributions having an
infinite-volume support.

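Definition 3.1: (Stable function) A real-valued measurable
function $f(\mathbf{x})$ is stable if $\exp\{-\lambda f(\mathbf{x})\}$ belongs to $\mathcal{L}^1(\mathbb{R}^d)$
for all $\lambda > 0$.

Remark 3.6: If $f(\mathbf{x})$ is stable so is $\lambda f(\mathbf{x})$ for all $\lambda \in (0, +\infty)$.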
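
To make the notion of a stable function concrete, the following sketch numerically integrates exp{-lambda f(x)} on the real line for f(x) = |x| and f(x) = x^2 and compares the results with the closed forms 2/lambda and sqrt(pi/lambda), which are finite for every lambda > 0. The truncation window, the quadrature rule, and the function name are assumptions made for this illustration.

```python
import math

def l1_norm_of_exp(f, lam, half_width=100.0, n=200_000):
    """Midpoint-rule estimate of the integral of exp(-lam * f(x)) over [-half_width, half_width]."""
    h = 2.0 * half_width / n
    return h * sum(math.exp(-lam * f(-half_width + (i + 0.5) * h)) for i in range(n))

def absolute_value(x):
    """Mean absolute energy measurement function."""
    return abs(x)

def square(x):
    """Mean quadratic energy measurement function."""
    return x * x
```

For lambda = 0.5, l1_norm_of_exp(absolute_value, 0.5) is close to 4 and l1_norm_of_exp(square, 0.5) is close to sqrt(2 pi). By contrast, f(x) = ln(1 + |x|) is not stable, since exp{-lambda f(x)} = (1 + |x|)^(-lambda) is integrable only for lambda > 1.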

Theorem 3.5: (Existence of the maxent distribution - stabilizing
constraint) Let $S$ be a closed, Lebesgue-measurable
subset of $\mathbb{R}^d$ having nonzero but possibly infinite volume and
$F$ be a nonempty, convex, $\mathcal{L}^1$-complete collection of pdfs over
$S$. If

(i) there exists a $\pi_0$ in $F$ such that $-\infty < h(\pi_0)$ and

(ii) there exist finite reals $L$, $u$, with $L < u$, and a stable
function $\psi$ such that for all $\pi$ in $F$, $L \le E_\pi[\psi] \le u$,

then $h(F) := \big(\sup_{\pi \in F} h(\pi)\big) \in \mathbb{R}$, that is, $h(F)$ is finite, and
there exists a unique maxent pdf in $F$.

Corollary 3.6: Let $\Omega(\mathbf{u})$ be as in (2.1). Let $\{\phi_\gamma\}_{\gamma \in \Gamma}$ be
uniformly bounded from below by $L \in \mathbb{R}$ and $S$ have nonzero
(but possibly infinite) volume. If $\Omega$ is nonempty and

1) $C := \bigcap_{\gamma \in \Gamma} \{\mathbf{x} \in S : \phi_\gamma(\mathbf{x}) < u_\gamma\}$ has nonzero volume,

2) there exists $\gamma_0 \in \Gamma$ for which $u_{\gamma_0} \in \mathbb{R}$ and $\phi_{\gamma_0}$ is stable,

then there exists a unique maxent pdf $\pi_{ME} \in \Omega(\mathbf{u})$ having the
exponential form given by Theorem 3.1 with $h(\pi_{ME}) \in \mathbb{R}$.

Remark 3.7: In Corollaries 3.4 and 3.6, the condition that
the measurement functions $\{\phi_\gamma\}_{\gamma \in \Gamma}$ be uniformly bounded
from below by $L \in \mathbb{R}$ is sufficient to ensure that $\Omega(\mathbf{u})$ is
complete under the $\mathcal{L}^1(\mathbb{R}^d)$ norm (see Proposition C.1 in
Appendix C). The condition that

$$C := \bigcap_{\gamma \in \Gamma} \{\mathbf{x} \in S : \phi_\gamma(\mathbf{x}) < u_\gamma\}$$

has nonzero volume is a sufficient condition to ensure that
there is at least one pdf $\pi_0$ with $-\infty < h(\pi_0)$.

In conclusion, we demonstrate a rich class of "well-behaved"
constraint functions for which condition (2) in
Corollary 3.6 is satisfied. The main result here is Theorem 3.7
whose proof appears in Appendix D.

Definition 3.2: (Omni-directional unboundedness) A real-valued
function $f$ on a vector space is asymptotically positive
and unbounded in all directions if $f(\mathbf{z}) \to +\infty$ whenever
$\|\mathbf{z}\| \to \infty$. For simplicity we shall refer to this as the
omni-directional unboundedness property (which is also sometimes
referred to as the coercive property [26, Definition A.4(c),
p. 653]).

Remark 3.8: In a finite-dimensional Banach space such as
$\mathbb{R}^d$, all norms are equivalent [27, Theorem 23.6, p. 177]. In
other words, if $\|\cdot\|_a$ and $\|\cdot\|_b$ are two norms, there are
positive constants $L > 0$ and $U > 0$ such that

$$L \|\mathbf{x}\|_a \le \|\mathbf{x}\|_b \le U \|\mathbf{x}\|_a$$

for all $\mathbf{x}$ in the finite-dimensional Banach space. Thus in $\mathbb{R}^d$,

$$\|\mathbf{x}\|_a \to \infty \iff \|\mathbf{x}\|_b \to \infty.$$

The definition of omni-directional unboundedness therefore
does not depend upon the specific norm used when the
underlying space is finite dimensional.

Definition 3.3: (Well-behaved function) Let $\phi : \mathbb{R}^d \to \mathbb{R}$
be a convex and omni-directionally unbounded function. A
real-valued function $\psi : \mathbb{R}^d \to \mathbb{R}$ is well-behaved if there
exists a nonnegative real number $M$ such that

$$\phi(\mathbf{x}) \le \psi(\mathbf{x}), \ \forall \mathbf{x} \in \mathbb{R}^d : \|\mathbf{x}\| > M \quad \text{and} \quad \sup_{\|\mathbf{x}\| \le M} |\psi(\mathbf{x})| < +\infty.$$

Remark 3.9: A convex and omni-directionally unbounded
function is well-behaved. If $f(\mathbf{x})$ is well-behaved so is $\lambda f(\mathbf{x})$
for all $\lambda$ belonging to the open interval $(0, +\infty)$.

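Theorem 3.7: A well-behaved function is stable. If $\phi_{\gamma_0}$ is
well-behaved and

$$E_\pi[\phi_{\gamma_0}] \le u_{\gamma_0} < +\infty$$

then $h(\pi)$ exists and

$$h(\pi) \le u_{\gamma_0} + \ln \|e^{-\phi_{\gamma_0}}\|_{\mathcal{L}^1} < \infty.$$

Hence, if $\phi_{\gamma_0}$ belongs to $\{\phi_\gamma\}_{\gamma \in \Gamma}$ in Corollary 3.6 then

$$\sup_{\pi \in \Omega(\mathbf{u})} h(\pi) \le u_{\gamma_0} + \ln \|e^{-\phi_{\gamma_0}}\|_{\mathcal{L}^1} < +\infty.$$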
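
The entropy bound h(pi) <= u + ln ||exp(-phi)||_{L^1} can be checked in closed form for a familiar case. Take d = 1 and phi(x) = x^2, which is convex and omni-directionally unbounded, so that ||exp(-phi)||_{L^1} = sqrt(pi). A zero-mean Gaussian pdf with variance sigma^2 satisfies E[phi] = sigma^2, and its differential entropy is (1/2) ln(2 pi e sigma^2). The sketch below evaluates both sides of the bound with u = sigma^2; the function names are ours and the example is an illustration rather than part of the proof.

```python
import math

def gaussian_entropy(var):
    """Differential entropy (in nats) of a zero-mean Gaussian with variance var."""
    return 0.5 * math.log(2.0 * math.pi * math.e * var)

def entropy_bound(u):
    """Right-hand side u + ln ||exp(-x^2)||_{L^1}, using the closed form sqrt(pi)."""
    return u + 0.5 * math.log(math.pi)

def bound_gap(var):
    """Bound minus entropy for the Gaussian whose second moment equals the bound u = var."""
    return entropy_bound(var) - gaussian_entropy(var)
```

The gap simplifies to sigma^2 - 1/2 - (1/2) ln(2 sigma^2), which is nonnegative for every sigma^2 > 0 and vanishes at sigma^2 = 1/2, where the Gaussian coincides with the normalized density exp(-x^2)/sqrt(pi).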

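Remark 3.10: Suppose that in Corollary 3.6 none of the
measurement functions $\{\phi_\gamma\}_{\gamma \in \Gamma}$ is well-behaved, but some
nonnegative linear combination of the measurement functions

$$\phi_\mu := \sum_{\gamma \in \Gamma} \mu_\gamma \phi_\gamma, \quad \text{where } 0 \le \mu_\gamma < +\infty \text{ for all } \gamma \in \Gamma,$$

is well-behaved. Let $u_\mu := \sum_{\gamma \in \Gamma} \mu_\gamma u_\gamma$ and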

$$\Omega_\mu := \{\text{pdf } \pi : E_\pi[\phi_\mu] \le u_\mu\}.$$

It is clear that $\Omega(\mathbf{u}) \subseteq \Omega_\mu$. Hence, the well-behaved function
$\phi_\mu$ and the associated moment constraint

$$-\infty < L \le E_\pi[\phi_\mu] \le u_\mu$$

can be included in the set of available moment measurements
without affecting the maxent solution. Although this new
constraint is redundant, it tells us that Theorem 3.7 can be
applied and the maxent distribution in $\Omega$ exists under the mild
requirements of Corollary 3.6.
Appendix A
Preliminaries

Definition A.1: (Convex set) A subset $C$ of a vector space
is said to be convex if whenever $\mathbf{z}_1$ and $\mathbf{z}_2$ are in $C$, so is
$\alpha \mathbf{z}_1 + (1 - \alpha)\mathbf{z}_2$ for every $\alpha$ in the closed interval $[0, 1]$.

Definition A.2: (Convex function) Let $V$ be a vector space.
A functional $f : V \to \mathbb{R}$ is said to be convex if for every
$\alpha \in [0, 1]$, and for any $\mathbf{z}_1$ and $\mathbf{z}_2$ belonging to $V$,

$$f(\alpha \mathbf{z}_1 + (1 - \alpha)\mathbf{z}_2) \le \alpha f(\mathbf{z}_1) + (1 - \alpha) f(\mathbf{z}_2).$$

If equality holds only when $\mathbf{z}_1 = \mathbf{z}_2$ then $f$ is said to be
strictly convex. If $-f$ is (strictly) convex then $f$ is said to be
(strictly) concave.

Definition A.3: (Absolute continuity) A pdf $\pi_1$ is said to be
absolutely continuous relative to a pdf $\pi_2$, in symbols $\pi_1 \ll \pi_2$
or $\pi_2 \gg \pi_1$, if for every Lebesgue measurable subset $A$ of
$\mathbb{R}^d$, $\int_A \pi_2 = 0$ implies $\int_A \pi_1 = 0$ and hence
$\operatorname{supp}(\pi_1) \subseteq \operatorname{supp}(\pi_2)$.

Fact A.1: [28, p. 5] The cross-entropy of pdf $\pi_1$ relative
to pdf $\pi_2$ is always well defined and non-negative (it could
be $+\infty$). The cross-entropy is zero if and only if $\pi_1 = \pi_2$
almost everywhere.

Fact A.2: [21] Differential entropy $h(\pi)$ is strictly concave
in $\pi$. Cross-entropy $D(\pi_1 \| \pi_2)$ is convex in the pair $(\pi_1, \pi_2)$
and strictly convex in $\pi_1$.

Fact A.3: (Joint lower semi-continuity of cross-entropy
[29, Section 2.4, Assertion 5]) If the pdfs $p_n$ and $q_n$ converge
in $\mathcal{L}^1(\mathbb{R}^d)$ norm to pdfs $p$ and $q$ respectively as $n \to \infty$,
then

$$D(p \| q) \le \liminf_{n \to \infty} D(p_n \| q_n). \tag{A.1}$$

Fact A.4: [20, p. 88], [11], [1, Theorem 1, p. 14]: If $F \subseteq
\mathcal{L}^1(\mathbb{R}^d)$ is a complete, convex collection of pdfs and
$h(F) := \sup_{\pi \in F} h(\pi)$ is finite, then there exists a unique distribution
$\pi^*$ belonging to $F$ such that for every sequence $\{\pi_n\} \subseteq F$ for
which $h(\pi_n) \to h(F)$, we have $\pi_n \to \pi^*$ in $\mathcal{L}^1(\mathbb{R}^d)$ norm.

Fact A.5: (A fundamental theorem of convex optimization
[30, adapted from Theorem 1, p. 217]) Let $V$ be a vector space
and $F$ a convex subset of $V$. Let $f : F \to \mathbb{R}$ be a convex
functional on $F$ and $\{g_\gamma\}_{\gamma \in \Gamma}$ a finite collection of convex
mappings from $F$ into $\mathbb{R}$. Suppose that there exists a point $v_0$
in $F$ such that for all $\gamma \in \Gamma$, $g_\gamma(v_0) < 0$ and

$$m_0 := \inf_{v \in G} f(v) \tag{A.2}$$

is finite where

$$G := \{v \in F : g_\gamma(v) \le 0, \ \forall \gamma \in \Gamma\}.$$

Then there exist nonnegative Lagrange multipliers $\{\lambda_\gamma\}_{\gamma \in \Gamma}$
such that

$$m_0 = \inf_{v \in F} \Big\{ f(v) + \sum_{\gamma \in \Gamma} \lambda_\gamma g_\gamma(v) \Big\}. \tag{A.3}$$

Furthermore, if the infimum is achieved in (A.2) by $v^*$
belonging to $G$, it is also achieved by $v^*$ in (A.3) and

$$\sum_{\gamma \in \Gamma} \lambda_\gamma g_\gamma(v^*) = 0. \tag{A.4}$$

Appendix B 

Characterization of the maxent distribution 

1) Proof of Theorem 3.1: We shall apply Fact A.5 with

$$V = \mathcal{L}^1(\mathbb{R}^d), \qquad F = \{\text{pdf } \pi : \operatorname{supp}(\pi) \subseteq S\},$$

$f(\pi) := -h(\pi)$, $v_0 := \pi_0$, and

$$g_\gamma(\pi) := E_\pi[\phi_\gamma] - u_\gamma$$

for each $\gamma$ in $\Gamma$. Clearly, $V$ is a vector space and $F$ is a convex
subset of $V$. Since $h(\pi)$ is a concave functional, $f(\pi)$ is a
convex functional on $F$. Also, $\{g_\gamma(\pi)\}_{\gamma \in \Gamma}$ is a finite collection
of linear (hence convex) functionals on $F$. By assumption, $\pi_0$
belongs to $F$ and for each $\gamma$ in $\Gamma$, $g_\gamma(\pi_0) < 0$. Therefore $F$
is nonempty. The infimum $m_0$ in Fact A.5 is attained at $v^* =
\pi_{\mathrm{ME}}$ and is equal to $-h(\pi_{\mathrm{ME}})$, which is finite, that is, $m_0 =
-h(\pi_{\mathrm{ME}}) \in \mathbb{R}$. Hence $\pi_{\mathrm{ME}} \cdot \ln \pi_{\mathrm{ME}}$ is absolutely integrable on
$\mathbb{R}^d$. We have now verified that the conditions of Fact A.5 are
fulfilled and as a consequence, we are guaranteed the existence
of nonnegative reals $\{\lambda_\gamma\}_{\gamma \in \Gamma}$ so that (A.3) and (A.4) hold, that
is,



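$$-h(\pi_{\mathrm{ME}}) = \min_{\pi \in F} \Big\{ -h(\pi) + \sum_{\gamma \in \Gamma} \lambda_\gamma \big[ E_\pi[\phi_\gamma] - u_\gamma \big] \Big\}, \tag{B.1}$$

$$\sum_{\gamma \in \Gamma} \lambda_\gamma \big[ E_{\pi_{\mathrm{ME}}}[\phi_\gamma] - u_\gamma \big] = 0. \tag{B.2}$$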

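The last condition is equation (3.2) in Theorem 3.1. Consider
perturbations around the minimizer $\pi_{\mathrm{ME}}$ of the form

$$\pi_\theta := \pi_{\mathrm{ME}} \cdot (1 + \theta q), \quad \text{where } \theta \in [0, 1], \ q \in \mathcal{L}^\infty(\mathbb{R}^d), \ \|q\|_\infty \le 1, \text{ and } E_{\pi_{\mathrm{ME}}}[q] = 0. \tag{B.3}$$

It can be verified that $\pi_\theta \ge 0$, $\|\pi_\theta\|_{\mathcal{L}^1} = 1$, and $\operatorname{supp}(\pi_\theta) \subseteq S$,
that is, $\pi_\theta$ is a pdf with support contained in $S$ for every
$\theta \in [0, 1]$. This ensures that the $\theta$-perturbations of $\pi_{\mathrm{ME}}$ along
$q$ lie inside $F$. In view of (B.2), and since every term in that sum
is nonpositive, for all $\gamma \in \Gamma$ for which $\lambda_\gamma > 0$ we must have

$$E_{\pi_{\mathrm{ME}}}[\phi_\gamma] = u_\gamma,$$

which implies that

$$E_{\pi_{\mathrm{ME}}}|\phi_\gamma| < \infty.$$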

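Hence,

$$E_{\pi_\theta}|\phi_\gamma| \le (1 + \theta) \cdot E_{\pi_{\mathrm{ME}}}|\phi_\gamma| < \infty \tag{B.4}$$

since $|1 + \theta q| \le 1 + \theta$. Furthermore,

$$0 < (1 - \theta) \le 1 + \theta \cdot q \le 1 + \theta$$

implies that for all $\mathbf{x} \in \mathbb{R}^d$ and for all $\theta \in [0, 1)$,

$$|\ln(1 + \theta \cdot q(\mathbf{x}))| \le A_\theta := \ln \max\Big(1 + \theta, \frac{1}{1 - \theta}\Big) < \infty.$$

It follows that

$$|h(\pi_\theta)| \le (1 + \theta) \cdot E_{\pi_{\mathrm{ME}}}|\ln \pi_{\mathrm{ME}}| + A_\theta < \infty, \quad \forall \theta \in [0, 1). \tag{B.5}$$

This shows that $\pi_\theta \cdot \ln \pi_\theta$ is also absolutely integrable for all
$\theta$ in $[0, 1)$. In view of (B.1), (B.4), (B.5), and the fact that $\Gamma$
is a finite index set,

$$-\infty < -h(\pi_{\mathrm{ME}}) + \sum_{\gamma \in \Gamma} \lambda_\gamma \big[ E_{\pi_{\mathrm{ME}}}[\phi_\gamma] - u_\gamma \big] \le -h(\pi_\theta) + \sum_{\gamma \in \Gamma} \lambda_\gamma \big[ E_{\pi_\theta}[\phi_\gamma] - u_\gamma \big] < +\infty.$$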



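Collecting terms together and using (B.2) we arrive at:

$$0 \le \theta \cdot E_{\pi_{\mathrm{ME}}}\Big[ q \cdot \sum_{\gamma \in \Gamma} \lambda_\gamma \phi_\gamma \Big] + \int_S (\pi_\theta \ln \pi_\theta - \pi_{\mathrm{ME}} \ln \pi_{\mathrm{ME}}) < +\infty.$$

Thus for all $\theta \in (0, 1)$ we have

$$0 \le E_{\pi_{\mathrm{ME}}}\Big[ q \cdot \sum_{\gamma \in \Gamma} \lambda_\gamma \phi_\gamma \Big] + \frac{1}{\theta} \int_S (\pi_\theta \ln \pi_\theta - \pi_{\mathrm{ME}} \ln \pi_{\mathrm{ME}}) < +\infty. \tag{B.6}$$

The function

$$\tau_\theta := \frac{1}{\theta} (\pi_\theta \ln \pi_\theta - \pi_{\mathrm{ME}} \ln \pi_{\mathrm{ME}})$$

is integrable for each $\theta$ in $(0, 1)$ and is nondecreasing in $\theta$ for
each $\mathbf{x}$ in $S$. Furthermore,

$$\tau_{0+} := \lim_{\theta \downarrow 0} \tau_\theta = q \cdot \pi_{\mathrm{ME}} \cdot (1 + \ln \pi_{\mathrm{ME}})$$

is also integrable. The monotone convergence theorem [15,
p. 87] applied to $(\tau_\theta - \tau_{0+})$ shows that

$$\int_S \tau_{0+} = \lim_{\theta \downarrow 0} \int_S \tau_\theta.$$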

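From (B.6) one therefore obtains:

$$0 \le \int_S q \cdot \pi_{\mathrm{ME}} \cdot \Big( 1 + \ln \pi_{\mathrm{ME}} + \sum_{\gamma \in \Gamma} \lambda_\gamma \phi_\gamma \Big) = \int_S q \cdot \pi_{\mathrm{ME}} \cdot \Big( \ln \pi_{\mathrm{ME}} + \sum_{\gamma \in \Gamma} \lambda_\gamma \phi_\gamma \Big) < +\infty, \tag{B.7}$$

where the equality follows from (B.3). But if (B.7) holds for $q$
satisfying (B.3), it also holds for $-q$. We are led to the
conclusion that for every $q$ belonging to $\mathcal{L}^\infty(\mathbb{R}^d)$ satisfying
$\|q\|_\infty \le 1$, whenever $\int_S q \cdot \pi_{\mathrm{ME}} = 0$ we must also have

$$\int_S q \cdot \pi_{\mathrm{ME}} \cdot \Big( \ln \pi_{\mathrm{ME}} + \sum_{\gamma \in \Gamma} \lambda_\gamma \phi_\gamma \Big) = 0.$$

Since both conditions are invariant under scaling of $q$, for all $q$
belonging to $\mathcal{L}^\infty(\mathbb{R}^d)$, whenever $\int_S q \cdot \pi_{\mathrm{ME}} = 0$
we must also have

$$\int_S q \cdot \pi_{\mathrm{ME}} \cdot \Big( \ln \pi_{\mathrm{ME}} + \sum_{\gamma \in \Gamma} \lambda_\gamma \phi_\gamma \Big) = 0.$$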

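Let $S_{\mathrm{ME}} := \operatorname{supp}(\pi_{\mathrm{ME}})$. Now, $1_{S_{\mathrm{ME}}} \cdot \pi_{\mathrm{ME}}$, $1_{S_{\mathrm{ME}}} \cdot \pi_{\mathrm{ME}} \cdot
\ln \pi_{\mathrm{ME}}$, and $\{1_{S_{\mathrm{ME}}} \cdot \pi_{\mathrm{ME}} \cdot \phi_\gamma\}_{\gamma \in \Gamma}$ all belong to $\mathcal{L}^1(\mathbb{R}^d)$
whose norm-dual [30, p. 106] is $\mathcal{L}^\infty(\mathbb{R}^d)$. If

$$1_{S_{\mathrm{ME}}} \cdot \pi_{\mathrm{ME}} \cdot \Big( \ln \pi_{\mathrm{ME}} + \sum_{\gamma \in \Gamma} \lambda_\gamma \phi_\gamma \Big)$$

does not belong to the one-dimensional closed subspace
spanned by $1_{S_{\mathrm{ME}}} \cdot \pi_{\mathrm{ME}}$, then by the Hahn-Banach theorem
[30, p. 133], there exists a bounded linear functional $q$ on
$\mathcal{L}^1(\mathbb{R}^d)$ which vanishes at $1_{S_{\mathrm{ME}}} \cdot \pi_{\mathrm{ME}}$ but not at

$$1_{S_{\mathrm{ME}}} \cdot \pi_{\mathrm{ME}} \cdot \Big( \ln \pi_{\mathrm{ME}} + \sum_{\gamma \in \Gamma} \lambda_\gamma \phi_\gamma \Big),$$

that is, there exists a $q$ in $\mathcal{L}^\infty(\mathbb{R}^d)$ such that $\int_S q \cdot \pi_{\mathrm{ME}} = 0$
but

$$\int_S q \cdot \pi_{\mathrm{ME}} \cdot \Big( \ln \pi_{\mathrm{ME}} + \sum_{\gamma \in \Gamma} \lambda_\gamma \phi_\gamma \Big) \ne 0,$$

contradicting the conclusion of the last paragraph. Hence there
exists a real scalar $\alpha$ such that

$$\pi_{\mathrm{ME}} \cdot \Big( \ln \pi_{\mathrm{ME}} + \sum_{\gamma \in \Gamma} \lambda_\gamma \phi_\gamma \Big) = -\alpha \, \pi_{\mathrm{ME}}$$

for all $\mathbf{x}$ in $S_{\mathrm{ME}}$, that is,

$$\pi_{\mathrm{ME}}(\mathbf{x}) = 1_{S_{\mathrm{ME}}}(\mathbf{x}) \cdot \exp\Big\{ -\alpha - \sum_{\gamma \in \Gamma} \lambda_\gamma \phi_\gamma(\mathbf{x}) \Big\}.$$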

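We shall presently show that for each $\pi$ belonging to $\Omega$ with
$-\infty < h(\pi)$, we have $D(\pi \| \pi_{\mathrm{ME}}) < +\infty$, that is, $\pi \ll
\pi_{\mathrm{ME}}$. In particular, this would mean that

$$\operatorname{supp}(\pi) \subseteq \operatorname{supp}(\pi_{\mathrm{ME}})$$

for all $\pi \in \Omega$ with $-\infty < h(\pi)$. To show this, define

$$\pi_k := \Big( 1 - \frac{1}{k} \Big) \pi_{\mathrm{ME}} + \frac{1}{k} \pi, \quad k = 1, 2, \ldots$$

and note that for each $k$, (i) $\pi_k$ belongs to $\Omega$, (ii) $\pi \ll \pi_k$ and
$\pi_{\mathrm{ME}} \ll \pi_k$, and (iii) $\pi_k \to \pi_{\mathrm{ME}}$, where the convergence
is in the almost everywhere sense and also under the $\mathcal{L}^1(\mathbb{R}^d)$
norm. We have

$$+\infty > h(\pi_{\mathrm{ME}}) \ge h(\pi_k) = \Big( 1 - \frac{1}{k} \Big) h(\pi_{\mathrm{ME}}) + \frac{1}{k} h(\pi) + \Big( 1 - \frac{1}{k} \Big) D(\pi_{\mathrm{ME}} \| \pi_k) + \frac{1}{k} D(\pi \| \pi_k)$$

$$\ge \Big( 1 - \frac{1}{k} \Big) h(\pi_{\mathrm{ME}}) + \frac{1}{k} h(\pi) + \frac{1}{k} D(\pi \| \pi_k),$$

where the first inequality follows from the existence of $\pi_{\mathrm{ME}}$
and because $\pi_k$ belongs to $\Omega$, the second equality is an identity,
and the third inequality follows from the nonnegativity of
cross-entropy (Fact A.1). Hence,

$$h(\pi) + D(\pi \| \pi_k) \le h(\pi_{\mathrm{ME}})$$

for all $k$. Taking limits, noting that $\pi_k$ converges to $\pi_{\mathrm{ME}}$ in
norm, and using the lower semi-continuity property of cross-entropy
(Fact A.3) one obtains

$$h(\pi) + D(\pi \| \pi_{\mathrm{ME}}) \le h(\pi_{\mathrm{ME}}) < \infty.$$

Since $h(\pi) > -\infty$, $D(\pi \| \pi_{\mathrm{ME}}) < \infty$. The characterization is
now complete. ■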
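
As a numerical illustration of the exponential form just derived, the following minimal sketch treats one assumed special case that is not taken from the paper: $S = [0, \infty)$ with the single measurement function $\phi(x) = x$ and bound $u$. In that case the exponential form reduces to $\exp\{-\alpha - \lambda x\}$ with $\lambda = 1/u$ and $\alpha = \ln u$. The helper name `differential_entropy`, the truncation point of the integral, and the uniform comparison pdf are all choices made for this sketch.

```python
import math

def differential_entropy(pdf, lo, hi, n=200_000):
    """Midpoint-rule approximation of -integral(pdf * ln pdf) over [lo, hi]."""
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):
        p = pdf(lo + (i + 0.5) * dx)
        if p > 0.0:  # the integrand is taken to be 0 where the pdf vanishes
            total -= p * math.log(p) * dx
    return total

u = 2.0              # assumed upper bound on the mean, E[x] <= u
lam = 1.0 / u        # Lagrange multiplier for this special case
alpha = math.log(u)  # normalization constant so that the pdf integrates to 1

def pi_me(x):
    """Exponential-form candidate exp{-alpha - lam * x} on [0, inf)."""
    return math.exp(-alpha - lam * x)

def pi_uniform(x):
    """A competing feasible pdf: uniform on [0, 2u], which also has mean u."""
    return 1.0 / (2.0 * u) if 0.0 <= x <= 2.0 * u else 0.0

# The tail beyond 60u carries negligible mass, so the integral is truncated there.
h_me = differential_entropy(pi_me, 0.0, 60.0 * u)
h_uniform = differential_entropy(pi_uniform, 0.0, 2.0 * u)
print(h_me, 1.0 + math.log(u), h_uniform)
```

The exponential-form pdf meets the moment constraint with equality (complementary slackness with $\lambda > 0$), and its entropy $1 + \ln u$ exceeds that of the uniform competitor, $\ln(2u)$.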
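2) Proof of Remark 3.4: Let $\mathbf{u}$ map to $\lambda(\mathbf{u})$ and $\pi_{\mathrm{ME}}$ be
the maxent pdf in $\Omega(\mathbf{u})$. Define

$$\Gamma_0 := \{\gamma \in \Gamma : \lambda_\gamma = 0\}.$$

Suppose that for all $\gamma$ in $\Gamma_0$, $u'_\gamma \ge u_\gamma$ and for all $\gamma$ in $\Gamma \setminus \Gamma_0$,
$u'_\gamma = u_\gamma$. Let $\pi'_{\mathrm{ME}}$ be the maxent pdf in $\Omega(\mathbf{u}')$. We shall
show that $\pi_{\mathrm{ME}} = \pi'_{\mathrm{ME}}$. Clearly, $\Omega(\mathbf{u}) \subseteq \Omega(\mathbf{u}')$ implies that
$h(\pi_{\mathrm{ME}}) \le h(\pi'_{\mathrm{ME}})$. On the other hand, using (B.1) with
$\pi = \pi'_{\mathrm{ME}}$ it follows that

$$h(\pi_{\mathrm{ME}}) \ge h(\pi'_{\mathrm{ME}}) - \sum_{\gamma \in \Gamma \setminus \Gamma_0} \lambda_\gamma \big[ E_{\pi'_{\mathrm{ME}}}[\phi_\gamma] - u_\gamma \big] \ge h(\pi'_{\mathrm{ME}})$$

since $\lambda_\gamma > 0$ and

$$E_{\pi'_{\mathrm{ME}}}[\phi_\gamma] \le u'_\gamma = u_\gamma$$

for all $\gamma$ in $\Gamma \setminus \Gamma_0$. Thus $h(\pi_{\mathrm{ME}}) = h(\pi'_{\mathrm{ME}})$. Since $\pi'_{\mathrm{ME}}$ is
unique, the result follows.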

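3) Proof of Theorem 3.2: Let $\alpha$ be the normalization
constant for which $\pi_{\exp}$ is a valid pdf. The condition

$$E_\pi[1_{S \setminus S_{\exp}}(\mathbf{x})] = 0$$

for all $\pi$ belonging to $\Omega(\mathbf{u})$ for which $-\infty < h(\pi)$ implies
that $\pi \ll \pi_{\exp}$, in particular, $\operatorname{supp}(\pi) \subseteq \operatorname{supp}(\pi_{\exp})$. Hence,

$$0 \le D(\pi \| \pi_{\exp}) = \alpha + \sum_{\gamma \in \Gamma} \lambda_\gamma(\mathbf{u}) E_\pi[\phi_\gamma(\mathbf{x})] - h(\pi) < \infty.$$

This implies that

$$-\infty < h(\pi) \le \alpha + \sum_{\gamma \in \Gamma} \lambda_\gamma(\mathbf{u}) E_\pi[\phi_\gamma] \le \alpha + \sum_{\gamma \in \Gamma} \lambda_\gamma u_\gamma < \infty.$$

Since $\pi_{\exp}$ belongs to $\Omega(\mathbf{u})$ and

$$\sum_{\gamma \in \Gamma} \lambda_\gamma \big( E_{\pi_{\exp}}[\phi_\gamma] - u_\gamma \big) = 0,$$

we have $E_{\pi_{\exp}}[\phi_\gamma] = u_\gamma$ for all $\gamma$ with $\lambda_\gamma > 0$. Thus, for all $\pi$
belonging to $\Omega(\mathbf{u})$ for which $-\infty < h(\pi)$,

$$-\infty < h(\pi) \le \alpha + \sum_{\gamma \in \Gamma} \lambda_\gamma(\mathbf{u}) E_{\pi_{\exp}}[\phi_\gamma] = h(\pi_{\exp}),$$

that is,

$$-\infty < h(\pi) \le h(\pi_{\exp}) < \infty.$$

Hence, for all $\pi$ in $\Omega(\mathbf{u})$, $h(\pi) \le h(\pi_{\exp}) < \infty$ and $\pi_{\exp}$
belongs to $\Omega(\mathbf{u})$. ■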

Appendix C 
Proof of existence theorems 

1) Proof of Theorem 3.3: Let $\pi_S(\mathbf{x}) := \frac{1}{|S|} 1_S(\mathbf{x})$ (note that
$|S| < \infty$). For all $\pi$ in $F$ we have

$$0 \le D(\pi \| \pi_S) = \ln|S| - h(\pi),$$

that is,

$$h(\pi) \le \ln|S| < \infty.$$

Since there exists a pdf $\pi_0$ in $F$ for which $-\infty < h(\pi_0)$, it
follows that

$$h(F) := \sup_{\pi \in F} h(\pi) \in \mathbb{R},$$

that is, $h(F)$ is finite. Let $\{\pi_k\}_{k=1}^\infty$ be any sequence of pdfs in
$F$ such that for each $k$, $h(\pi_k) \in \mathbb{R}$ and $h(\pi_k) \to h(F)$ as $k$
goes to $\infty$. Since $F$ is $\mathcal{L}^1$-complete, from Fact A.4 it follows
that there exists a unique pdf $\pi^*$ in $F$ to which $\pi_k$ converges in
norm. Convergence in norm implies convergence in measure
which in turn implies the existence of a subsequence which
converges almost everywhere [15, Proposition 18, p. 95]. By
passing to the subsequence we can assume that, without loss
of generality, $\pi_k$ converges to $\pi^*$ almost everywhere (and in
norm). The lower semi-continuity property of cross-entropy
(see (A.1) in Fact A.3) implies that

$$\ln|S| - h(\pi^*) = D(\pi^* \| \pi_S) \le \liminf_{k \to \infty} D(\pi_k \| \pi_S) = \ln|S| - \lim_{k \to \infty} h(\pi_k) = \ln|S| - h(F). \tag{C.1}$$

This shows that $h(F) \le h(\pi^*)$. However, $h(\pi^*) \le h(F)$
because $\pi^*$ belongs to $F$. It follows that $h(\pi^*) = h(F)$ and
hence $\pi_{\mathrm{ME}} = \pi^*$ is the unique maxent pdf in $F$. ■
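
The identity $D(\pi \| \pi_S) = \ln|S| - h(\pi)$ that drives this proof can be checked numerically. The sketch below assumes a concrete case chosen only for illustration: $S = [0, 1]$ (so $|S| = 1$) and the pdf $\pi(x) = 2x$. The helper `integrate` is a plain midpoint rule written for this sketch.

```python
import math

def integrate(f, lo, hi, n=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    dx = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * dx) for i in range(n)) * dx

vol_S = 1.0  # volume of the assumed support S = [0, 1]

def pi(x):
    """An assumed test pdf supported on S."""
    return 2.0 * x

# Differential entropy h(pi) = -integral(pi * ln pi).
h = integrate(lambda x: -pi(x) * math.log(pi(x)), 0.0, 1.0)

# Cross-entropy relative to the uniform pdf pi_S = 1/|S| on S.
D = integrate(lambda x: pi(x) * math.log(pi(x) * vol_S), 0.0, 1.0)

print(h, D, math.log(vol_S) - h)
```

For this pdf the closed forms are $h(\pi) = 1/2 - \ln 2$ and $D(\pi \| \pi_S) = \ln 2 - 1/2$, so the bound $h(\pi) \le \ln|S| = 0$ holds with a gap equal to the cross-entropy.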
Proposition C.1: (Variational completeness of $\Omega$) Let $\Omega(\mathbf{u})$
be as in (2.1). Let $\{\phi_\gamma\}_{\gamma \in \Gamma}$ be uniformly bounded from below
by $L \in \mathbb{R}$. Then $\Omega$ is a convex collection of pdfs which is
complete under the $\mathcal{L}^1(\mathbb{R}^d)$ norm.

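Proof: $\Omega$ is convex because $E_\pi[\phi_\gamma]$ is linear in $\pi$.
Let $\{\pi_n\}_{n=1}^\infty$ be a Cauchy sequence in $\Omega \subseteq \mathcal{L}^1(\mathbb{R}^d)$. Since
$\mathcal{L}^1(\mathbb{R}^d)$ is complete with respect to the $\|\cdot\|_{\mathcal{L}^1(\mathbb{R}^d)}$ norm [15,
Theorem 6, p. 125] and $\{\pi_n\}_{n=1}^\infty$ is a Cauchy sequence, there
exists $\pi \in \mathcal{L}^1(\mathbb{R}^d)$ such that $\pi_n$ converges to $\pi$ under the
$\mathcal{L}^1(\mathbb{R}^d)$ norm. We need to show that: (i) $\pi \ge 0$, (ii) $\int_{\mathbb{R}^d} \pi = 1$,
and (iii) for all $\gamma$ in $\Gamma$, $E_\pi[\phi_\gamma] \le u_\gamma$. Recall that convergence
in $\mathcal{L}^1(\mathbb{R}^d)$-norm implies convergence in (Lebesgue) measure
which in turn implies the existence of a subsequence $\pi_{n_k}$
converging to $\pi$ almost everywhere in $\mathbb{R}^d$ [15, Proposition 18,
p. 95]. Since each element of the subsequence satisfies (i), so
does the limit $\pi$. Furthermore,

$$\Big| \int_{\mathbb{R}^d} \pi - 1 \Big| = \Big| \int_{\mathbb{R}^d} (\pi - \pi_n) \Big| \le \|\pi - \pi_n\|_{\mathcal{L}^1} \to 0$$

as $n \to \infty$ so (ii) holds. Applying Fatou's lemma [15,
Theorem 9, p. 86] to the sequence of non-negative functions

$$\pi_{n_k}(\mathbf{x}) [\phi_\gamma(\mathbf{x}) - L]$$

which converges to

$$\pi(\mathbf{x}) [\phi_\gamma(\mathbf{x}) - L]$$

gives

$$\int_{\mathbb{R}^d} \pi \phi_\gamma \le \liminf_{k \to \infty} \int_{\mathbb{R}^d} \pi_{n_k} \phi_\gamma \le u_\gamma.$$

Hence (iii) also holds, and $\pi$ belongs to $\Omega$. ■

2) Proof of Corollary 3.4: $\Omega$ is nonempty by assumption,
convex by definition, and $\mathcal{L}^1$-complete by Proposition C.1. $S$
has finite volume by assumption. Since $C$ has nonzero volume
and $C \subseteq S$ which has finite volume, $0 < |C| < \infty$. If

$$\pi_C(\mathbf{x}) := \frac{1}{|C|} 1_C(\mathbf{x})$$

denotes the distribution that is uniform over the set $C$, it is
clear that $\pi_C$ belongs to $\Omega(\mathbf{u})$ and

$$h(\pi_C) = \ln|C| > -\infty.$$

Hence by Theorem 3.3

$$h(\Omega) := \sup_{\pi \in \Omega} h(\pi) \in \mathbb{R},$$

that is, $h(\Omega)$ is finite, in fact

$$-\infty < \ln|C| \le h(\Omega) \le \ln|S| < \infty,$$

and there exists a unique maxent pdf $\pi_{\mathrm{ME}}$ belonging to $\Omega(\mathbf{u})$
having the exponential form given by Theorem 3.1. ■

3) Proof of Theorem 3.5: For each $\lambda > 0$, let

$$Z_\lambda := \|e^{-\lambda \psi}\|_{\mathcal{L}^1} < +\infty$$

and

$$\pi_\lambda := (Z_\lambda)^{-1} \exp\{-\lambda \psi\}.$$

For all $\pi$ in $F$ we have

$$0 \le D(\pi \| \pi_\lambda) = \lambda E_\pi[\psi] + \ln Z_\lambda - h(\pi),$$

that is,

$$h(\pi) \le \lambda E_\pi[\psi] + \ln Z_\lambda \le \lambda u + \ln Z_\lambda < \infty.$$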

Since there exists a pdf $\pi_0$ in $F$ for which $-\infty < h(\pi_0)$, it
follows that

$$h(F) := \sup_{\pi \in F} h(\pi) \in \mathbb{R},$$

that is, $h(F)$ is finite. Since $F$ is $\mathcal{L}^1$-complete, following
the proof of Theorem 3.3 there exists a unique pdf $\pi^*$ in
$F$ and a sequence $\pi_k$ in $F$ such that for each $k$, $h(\pi_k) \in \mathbb{R}$,
$h(\pi_k) \to h(F)$ as $k$ goes to $\infty$ and $\pi_k \to \pi^*$ both in
norm and in the almost everywhere sense. The lower semi-continuity
property of cross-entropy (see (A.1) in Fact A.3)
and the moment constraints

$$\{\forall \pi \in F, \ -\infty < L \le E_\pi[\psi] \le u < \infty\}$$

imply that

$$\lambda E_{\pi^*}[\psi] + \ln Z_\lambda - h(\pi^*) = D(\pi^* \| \pi_\lambda) \le \liminf_{k \to \infty} D(\pi_k \| \pi_\lambda)$$

$$= \liminf_{k \to \infty} \big[ \lambda E_{\pi_k}[\psi] + \ln Z_\lambda - h(\pi_k) \big] \le \lambda u + \ln Z_\lambda - \lim_{k \to \infty} h(\pi_k) = \lambda u + \ln Z_\lambda - h(F).$$

This shows that

$$h(F) \le h(\pi^*) + \lambda [u - E_{\pi^*}[\psi]] \le h(\pi^*) + \lambda (u - L)$$

for all $\lambda > 0$. Hence for every $\epsilon > 0$, by choosing $\lambda$ such that
$\lambda (u - L) < \epsilon$, we obtain $h(F) < h(\pi^*) + \epsilon$. Thus $h(F) \le
h(\pi^*)$. However, $h(\pi^*) \le h(F)$ because $\pi^*$ belongs to $F$.
It follows that $h(\pi^*) = h(F)$ and hence $\pi_{\mathrm{ME}} = \pi^*$ is the
unique maxent pdf in $F$. ■
4) Proof of Corollary 3.6: $\Omega$ is nonempty by assumption,
convex by definition, and $\mathcal{L}^1$-complete by Proposition C.1. Let
$C'$ be a subset of $C$ having nonzero but finite volume and

$$\pi_{C'}(\mathbf{x}) := \frac{1}{|C'|} 1_{C'}(\mathbf{x}).$$

It is clear that $\pi_{C'}$ belongs to $\Omega(\mathbf{u})$ and

$$h(\pi_{C'}) = \ln|C'| > -\infty.$$

Since $\phi_{\gamma_0}$ is uniformly bounded from below by $L \in \mathbb{R}$, for all
$\pi$ in $\Omega$ we have

$$-\infty < L \le E_\pi[\phi_{\gamma_0}].$$

Again, for all $\pi$ in $\Omega$,

$$E_\pi[\phi_{\gamma_0}] \le u_{\gamma_0} < \infty.$$

Hence by Theorem 3.3

$$h(\Omega) := \sup_{\pi \in \Omega} h(\pi) \in \mathbb{R},$$

that is, $h(\Omega)$ is finite, in fact

$$-\infty < \ln|C'| \le h(\Omega) \le \inf_{\lambda > 0} [u \lambda + \ln Z_\lambda] < \infty$$

where $Z_\lambda$ is as in the proof of Theorem 3.5 and there exists
a unique maxent pdf $\pi_{\mathrm{ME}}$ belonging to $\Omega(\mathbf{u})$ having the
exponential form given by Theorem 3.1. ■
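
The upper bound $\inf_{\lambda > 0} [u \lambda + \ln Z_\lambda]$ can be evaluated in closed form in simple cases. The sketch below assumes, purely for illustration, $d = 1$ and the measurement function $\psi(x) = |x|$, for which $Z_\lambda = 2/\lambda$; neither this choice nor the grid search over $\lambda$ comes from the paper. The bound is compared with the entropy $1 + \ln(2u)$ of the Laplace pdf with $E|x| = u$, which is feasible.

```python
import math

def entropy_upper_bound(lam, u):
    """lam * u + ln Z_lam for the assumed psi(x) = |x|, where Z_lam = 2 / lam."""
    return lam * u + math.log(2.0 / lam)

u = 1.5  # assumed moment bound, E|x| <= u

# Coarse grid search over lam in (0, 10]; the exact minimizer is lam = 1 / u.
lams = [0.01 * k for k in range(1, 1001)]
best_bound = min(entropy_upper_bound(lam, u) for lam in lams)

# Entropy of the Laplace pdf (1 / (2u)) * exp(-|x| / u), which satisfies E|x| = u.
h_laplace = 1.0 + math.log(2.0 * u)

print(best_bound, h_laplace)
```

In this special case the infimum is attained at $\lambda = 1/u$ and coincides with the Laplace entropy, so the bound is tight here.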



Appendix D 
Proof of Theorem 3.7

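Proposition D.1: If pdf $\pi$ belongs to $\mathcal{L}^\infty(\mathbb{R}^d)$ then $h(\pi)$
exists and

$$-\infty < -\ln \|\pi\|_{\mathcal{L}^\infty} \le h(\pi).$$

If pdf $\pi$ belongs to $\mathcal{L}^2(\mathbb{R}^d)$ then $h(\pi)$ exists and

$$-\infty < 1 - \|\pi\|_{\mathcal{L}^2}^2 \le h(\pi).$$

Proof: Since $\|\pi\|_{\mathcal{L}^1} = 1$ and $\pi$ belongs to $\mathcal{L}^\infty(\mathbb{R}^d)$, it
follows that $0 < \|\pi\|_{\mathcal{L}^\infty}$. Also,

$$0 \le \pi(\mathbf{x}) \le \|\pi\|_{\mathcal{L}^\infty}$$

almost everywhere. Thus,

$$-\infty < -\pi \ln \|\pi\|_{\mathcal{L}^\infty} \le -\pi \ln \pi.$$

Since for all nonnegative $t$, $\ln t \le t - 1$, we have

$$\pi(\mathbf{x}) - (\pi(\mathbf{x}))^2 \le -\pi(\mathbf{x}) \ln \pi(\mathbf{x})$$

almost everywhere. Since $\pi$ belongs to $\mathcal{L}^2(\mathbb{R}^d)$ and $\pi$ is a pdf,
the result follows. ■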
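
Both lower bounds of Proposition D.1 can be checked on a pdf whose norms and entropy are known in closed form. The sketch below uses the standard normal pdf on $\mathbb{R}$, an example chosen for this illustration rather than taken from the paper.

```python
import math

# Closed-form quantities for the standard normal pdf on the real line.
h = 0.5 * math.log(2.0 * math.pi * math.e)   # differential entropy
sup_norm = 1.0 / math.sqrt(2.0 * math.pi)    # L-infinity norm (peak value at 0)
l2_norm_sq = 1.0 / (2.0 * math.sqrt(math.pi))  # squared L2 norm

bound_from_sup = -math.log(sup_norm)   # first lower bound of Proposition D.1
bound_from_l2 = 1.0 - l2_norm_sq       # second lower bound of Proposition D.1

print(bound_from_sup, bound_from_l2, h)
```

Neither bound is tight for this pdf; each only certifies that the entropy is strictly greater than $-\infty$.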
Remark D.1: The conditions in the above proposition are
not necessary for $h(\pi)$ to exist and be strictly greater than
$-\infty$. For example, if

$$\pi(x) := 1_{(0,1]}(x) \, \frac{1}{2\sqrt{x}},$$

then $h(\pi) = \ln(2/e)$, where $1_{(0,1]}(x)$ is the characteristic function
of the interval $(0, 1]$. The conditions in Proposition D.1
do not guarantee that $h(\pi)$ will be finite. For example,
the pdf given in [31, p. 237]
is both bounded and square integrable, which implies that $h(\pi)$
exists, but $h(\pi) = +\infty$. In the sequel, we shall
derive a general moment condition that ensures that $h(\pi)$, when
it exists, is less than $+\infty$ (Corollary D.6).

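Proposition D.2: (Sufficient condition for integrability.) If
$\phi : \mathbb{R}^d \to \mathbb{R}$ is convex and omni-directionally unbounded, then
for all strictly positive $\alpha$,

$$0 < Z_\phi(\alpha) := \int_{\mathbb{R}^d} \exp\{-\alpha \phi(\mathbf{x})\} \, d\mathbf{x} < \infty.$$

In other words, a convex and omni-directionally unbounded
function is stable.

Proof: It is clear that for all real-valued $\alpha$, $0 < Z_\phi(\alpha)$.
Since $\phi(\mathbf{x})$ is unbounded in all directions, there exists a strictly
positive $r$ such that for all $\mathbf{x}$ satisfying $\|\mathbf{x}\|_{\ell^1} > r$ we have
$\phi(\mathbf{x}) > \phi(\mathbf{0})$. Thus,

$$\inf_{\mathbf{x} \in \mathbb{R}^d} \phi(\mathbf{x}) = \inf_{\|\mathbf{x}\|_{\ell^1} \le r} \phi(\mathbf{x}) = \min_{\|\mathbf{x}\|_{\ell^1} \le r} \phi(\mathbf{x}) = \phi(\mathbf{x}_0)$$

for some $\mathbf{x}_0$ in $\mathbb{R}^d$ satisfying $\|\mathbf{x}_0\|_{\ell^1} \le r$. The second equality
follows because $\phi$, being convex on $\mathbb{R}^d$, is continuous, and the
closed ball

$$\{\mathbf{x} \in \mathbb{R}^d : \|\mathbf{x}\|_{\ell^1} \le r\}$$

is a compact subset of $\mathbb{R}^d$. Next, define the function

$$\psi(\mathbf{x}) := \phi(\mathbf{x} + \mathbf{x}_0) - \phi(\mathbf{x}_0).$$

Since $\mathbf{x}_0$ is a global minimizer of $\phi(\mathbf{x})$, it follows that $\psi(\mathbf{x})$
is non-negative, and attains its global minimum value of $0$ at
the origin. The function $\psi(\mathbf{x})$ also inherits the convexity and
omni-directional unboundedness properties of $\phi(\mathbf{x})$. Hence
it suffices to demonstrate that for all strictly positive $\alpha$,
$\exp\{-\alpha \psi(\mathbf{x})\}$ is integrable. Since $\psi(\mathbf{x})$ is non-negative and
unbounded in all directions, there exists a $\rho > 0$ such that for
all $\mathbf{x}$ satisfying $\|\mathbf{x}\|_{\ell^1} \ge \rho$ we have $\psi(\mathbf{x}) \ge 1$. Now

$$\inf_{\|\mathbf{x}\|_{\ell^1} = \rho} \psi(\mathbf{x}) = \min_{\|\mathbf{x}\|_{\ell^1} = \rho} \psi(\mathbf{x}) = \psi(\mathbf{x}^*) \ge 1 \quad \text{with } \|\mathbf{x}^*\|_{\ell^1} = \rho.$$

The first equality follows from the continuity of $\psi(\mathbf{x})$ and the
compactness of the closed sphere of radius $\rho$ in $\mathbb{R}^d$. The last
inequality above follows from the way $\rho$ has been defined. For
all $\mathbf{x}$ in $\mathbb{R}^d$ having a norm $\|\mathbf{x}\|_{\ell^1}$ which is strictly larger than
$\rho$, the convexity of $\psi(\mathbf{x})$ and the definition of $\mathbf{x}^*$ imply that

$$1 \le \psi(\mathbf{x}^*) \le \psi\Big( \frac{\rho \, \mathbf{x}}{\|\mathbf{x}\|_{\ell^1}} \Big) \le \frac{\rho}{\|\mathbf{x}\|_{\ell^1}} \psi(\mathbf{x}) + \Big( 1 - \frac{\rho}{\|\mathbf{x}\|_{\ell^1}} \Big) \psi(\mathbf{0}).$$

For all $\alpha > 0$ and for all $\mathbf{x}$ in $\mathbb{R}^d$ such that $\|\mathbf{x}\|_{\ell^1} > \rho$ we
have

$$-\alpha \psi(\mathbf{x}) \le -\frac{\alpha}{\rho} \|\mathbf{x}\|_{\ell^1}$$

since $\psi(\mathbf{0}) = 0$. Thus, for all $\mathbf{x}$ in $\mathbb{R}^d$ such that
$\|\mathbf{x}\|_{\ell^1} > \rho > 0$,

$$\exp\{-\alpha \psi(\mathbf{x})\} \le \exp\Big\{ -\frac{\alpha}{\rho} \|\mathbf{x}\|_{\ell^1} \Big\}.$$

Finally, since

$$\|\mathbf{x}\|_{\ell^1} = \sum_{i=1}^d |x_i|$$

and the exponential function $\exp\{-|t|\}$, $t \in \mathbb{R}$, is integrable
over $\mathbb{R}$, the result follows. ■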
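
The domination step at the end of this proof can be illustrated in one dimension. The sketch below assumes $\psi(x) = x^2$, which is convex, non-negative, zero at the origin, and at least $1$ whenever $|x| \ge \rho$ with $\rho = 1$; these choices, the grid, and the helper name `tail_bound_holds` are made for this illustration only.

```python
import math

def psi(x):
    """Assumed convex, omni-directionally unbounded function with psi(0) = 0."""
    return x * x

rho = 1.0  # psi(x) >= 1 whenever |x| >= rho

def tail_bound_holds(alpha, xs):
    """Check exp{-alpha * psi(x)} <= exp{-(alpha / rho) * |x|} for all |x| > rho."""
    return all(
        math.exp(-alpha * psi(x)) <= math.exp(-(alpha / rho) * abs(x))
        for x in xs
        if abs(x) > rho
    )

xs = [0.01 * i for i in range(-2000, 2001)]  # grid on [-20, 20]
alphas = [0.5, 1.0, 3.0]

results = {}
for alpha in alphas:
    z_exact = math.sqrt(math.pi / alpha)  # integral of exp(-alpha * x^2) over R
    # Inside the ball the integrand is at most 1; outside, use the exponential tail.
    z_bound = 2.0 * rho + 2.0 * (rho / alpha) * math.exp(-alpha)
    results[alpha] = (tail_bound_holds(alpha, xs), z_exact, z_bound)

print(results)
```

The finite quantity `z_bound` is the kind of estimate the proof produces: the volume of the ball of radius $\rho$ plus the integral of the dominating exponential over its complement.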
The conditions imposed on the function in the previous proposition can be
somewhat relaxed as the following corollary demonstrates.

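Corollary D.3: A well-behaved function is stable, that is,
if $\psi : \mathbb{R}^d \to \mathbb{R}$ is well-behaved, then for all $\alpha > 0$,

$$0 < Z_\psi(\alpha) := \int_{\mathbb{R}^d} e^{-\alpha \psi(\mathbf{x})} \, d\mathbf{x} < +\infty.$$

Proof: Since $\psi$ is well-behaved, there exists a convex,
omni-directionally unbounded function $\phi : \mathbb{R}^d \to \mathbb{R}$ and a
nonnegative real number $M$ such that for all $\mathbf{x}$ in $\mathbb{R}^d$ whose
norm is strictly larger than $M$ we have $\phi(\mathbf{x}) \le \psi(\mathbf{x})$ and
for all $\mathbf{x}$ in $\mathbb{R}^d$ whose norm is no larger than $M$ we have
$|\psi(\mathbf{x})| < +\infty$. Now,

$$\int_{\mathbb{R}^d} e^{-\alpha \psi(\mathbf{x})} \, d\mathbf{x} = \int_{\{\mathbf{x} : \|\mathbf{x}\| \le M\}} e^{-\alpha \psi(\mathbf{x})} \, d\mathbf{x} + \int_{\{\mathbf{x} : \|\mathbf{x}\| > M\}} e^{-\alpha \psi(\mathbf{x})} \, d\mathbf{x}.$$

The first term on the right side is bounded since $\psi(\mathbf{x})$ is
bounded over the set

$$\{\mathbf{x} \in \mathbb{R}^d : \|\mathbf{x}\| \le M\}$$

which has finite measure. Proposition D.2 provides an upper
bound for the second term:

$$\int_{\{\mathbf{x} : M < \|\mathbf{x}\|\}} e^{-\alpha \psi(\mathbf{x})} \, d\mathbf{x} \le \int_{\{\mathbf{x} : M < \|\mathbf{x}\|\}} e^{-\alpha \phi(\mathbf{x})} \, d\mathbf{x} \le Z_\phi(\alpha) < \infty,$$

and the result follows. ■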
Proposition D.4: Let $\pi$ be a pdf for which there exists a
convex and omni-directionally unbounded function $\phi : \mathbb{R}^d \to \mathbb{R}$
such that $\phi(\mathbf{x}) \le -\ln \pi(\mathbf{x})$ for all $\|\mathbf{x}\|$ sufficiently large. Then
$h(\pi)$ exists and $h(\pi) < +\infty$. If further, $\pi$ belongs to $\mathcal{L}^\infty(\mathbb{R}^d)$
or $\mathcal{L}^2(\mathbb{R}^d)$ then $h(\pi)$ exists, and $|h(\pi)| < +\infty$, that is, $-\pi \ln \pi$
belongs to $\mathcal{L}^1(\mathbb{R}^d)$.

Proof: Let

$$P := \{\mathbf{x} \in \mathbb{R}^d : 0 < \pi(\mathbf{x}) < 1\}$$

be the set over which $-\pi \ln \pi$ is positive. From the
assumptions on $\pi$ there exists a strictly positive real number
$R$ such that for all $\|\mathbf{x}\| > R$,

$$0 \le \phi(\mathbf{x}) \le -\ln \pi(\mathbf{x}).$$

Define the set

$$B := \{\mathbf{x} \in \mathbb{R}^d : \|\mathbf{x}\| > R\}.$$

Its complement $B^c$ is a closed and bounded subset of $\mathbb{R}^d$ and
has finite volume. Write

$$\int_P -\pi \ln \pi = \int_{P \cap B^c} -\pi \ln \pi + \int_{P \cap B} -\pi \ln \pi. \tag{D.1}$$

We shall show that each integral on the right side of the above
equality is upper bounded by a positive real number. From this
it will follow that $h(\pi)$ exists and $h(\pi) < +\infty$. Since for all
nonnegative $t$, $\ln t \le t$, for all $\mathbf{x}$ in $P$ we have

$$0 < \ln \frac{1}{\pi(\mathbf{x})} \le \frac{1}{\pi(\mathbf{x})} \implies 0 < -\pi(\mathbf{x}) \ln \pi(\mathbf{x}) \le 1.$$

Thus the first integral on the right side of (D.1) is upper
bounded by the volume of $P \cap B^c$ which is no larger than the
volume of the bounded set $B^c$. Again, since for all nonnegative
$t$, $\ln t \le \sqrt{t}$, for all $\mathbf{x}$ in $P$ we have

$$0 < -\ln \pi(\mathbf{x}) \le \frac{1}{\sqrt{\pi(\mathbf{x})}} \implies 0 < -\pi(\mathbf{x}) \ln \pi(\mathbf{x}) \le \sqrt{\pi(\mathbf{x})}.$$

Now for all $\mathbf{x}$ in $P \cap B$ we have

$$0 < -\pi(\mathbf{x}) \ln \pi(\mathbf{x}) \le \sqrt{\pi(\mathbf{x})} \le e^{-\phi(\mathbf{x})/2}$$

where the last inequality follows from the fact that

$$\phi(\mathbf{x}) \le -\ln \pi(\mathbf{x})$$

for all $\mathbf{x}$ in $B$. We are led to the following inequalities

$$0 \le \int_{P \cap B} -\pi \ln \pi \le \int_{P \cap B} e^{-\phi/2} \le \|e^{-\phi/2}\|_{\mathcal{L}^1} < \infty$$

where the last inequality is a consequence of Corollary D.3.
From Proposition D.1 it follows that if further $\pi$ belongs to
$\mathcal{L}^\infty(\mathbb{R}^d)$ or to $\mathcal{L}^2(\mathbb{R}^d)$, then $h(\pi) > -\infty$ and hence $|h(\pi)| <
+\infty$, that is, $-\pi \ln \pi$ is absolutely integrable. The proof is
complete. ■
Corollary D.5: Let $\psi(\mathbf{x})$ be well-behaved. If we define

$$\pi(\mathbf{x}) := \frac{e^{-\psi(\mathbf{x})}}{\|e^{-\psi}\|_{\mathcal{L}^1}}$$

then $\pi$ is a pdf, and $\pi$ belongs to both $\mathcal{L}^\infty(\mathbb{R}^d)$ and $\mathcal{L}^2(\mathbb{R}^d)$.
Thus $\pi$ satisfies the conditions and hence the results of
Proposition D.4, that is, $h(\pi)$ exists and $|h(\pi)| < +\infty$.

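Corollary D.6: Let $\phi_{\gamma_0}$ be a well-behaved function. If $\pi$ is
any pdf that satisfies

$$E_\pi[\phi_{\gamma_0}] \le u_{\gamma_0} < +\infty$$

then $h(\pi)$ exists and

$$h(\pi) \le u_{\gamma_0} + \ln \|e^{-\phi_{\gamma_0}}\|_{\mathcal{L}^1} < +\infty.$$

If further $-\infty < h(\pi)$ then $0 \le D(\pi \| r) < +\infty$ where

$$r(\mathbf{x}) := \frac{e^{-\phi_{\gamma_0}(\mathbf{x})}}{\|e^{-\phi_{\gamma_0}}\|_{\mathcal{L}^1}}$$

is a pdf.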

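Proof: Corollary D.3 shows that $e^{-\phi_{\gamma_0}}$ is integrable and $r(\mathbf{x})$ is
hence a valid pdf. It is also clear that $\operatorname{supp}(r) = \mathbb{R}^d$ and hence
$\pi \ll r$ for each pdf $\pi$. The inequality $\ln t \le t - 1$, which holds
for all nonnegative $t$, when applied to $t = r(\mathbf{x})/\pi(\mathbf{x})$ reveals
that for all $\mathbf{x}$ belonging to the support-set of the pdf $\pi$,

$$-\pi(\mathbf{x}) \ln \pi(\mathbf{x}) \le r(\mathbf{x}) - \pi(\mathbf{x}) + \pi(\mathbf{x}) \phi_{\gamma_0}(\mathbf{x}) + \pi(\mathbf{x}) \ln \|e^{-\phi_{\gamma_0}}\|_{\mathcal{L}^1}.$$

Now since $E_\pi[\phi_{\gamma_0}] \le u_{\gamma_0}$ for $\pi$ belonging to $\Omega$, integrating
the above inequality over $\operatorname{supp}(\pi)$ we can conclude that $h(\pi)$
exists and

$$h(\pi) \le u_{\gamma_0} + \ln \|e^{-\phi_{\gamma_0}}\|_{\mathcal{L}^1} < +\infty.$$

It is also clear that if $-\infty < h(\pi)$ then

$$0 \le D(\pi \| r) = \ln \|e^{-\phi_{\gamma_0}}\|_{\mathcal{L}^1} - h(\pi) + \int_{\mathbb{R}^d} \pi \phi_{\gamma_0} \le u_{\gamma_0} + \ln \|e^{-\phi_{\gamma_0}}\|_{\mathcal{L}^1} - h(\pi) < +\infty.$$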
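
The bound of Corollary D.6 can be checked in closed form on a Gaussian family. The sketch below assumes $d = 1$, $\phi_{\gamma_0}(x) = x^2/2$ (convex and omni-directionally unbounded, hence well-behaved), and $\pi = N(0, \sigma^2)$ with $u_{\gamma_0}$ set to $E_\pi[\phi_{\gamma_0}] = \sigma^2/2$. All of these choices are made for the illustration only.

```python
import math

# ln of the L1 norm of exp(-x^2 / 2), i.e. ln sqrt(2 * pi).
LOG_NORM = 0.5 * math.log(2.0 * math.pi)

def entropy_gauss(sigma):
    """Differential entropy of N(0, sigma^2)."""
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2)

def entropy_bound(sigma):
    """u + ln ||exp(-phi)||, with u taken equal to E[phi] = sigma^2 / 2."""
    return sigma ** 2 / 2.0 + LOG_NORM

def cross_entropy_to_r(sigma):
    """D(pi || r) = ln ||exp(-phi)|| - h(pi) + E[phi], where r is N(0, 1)."""
    return LOG_NORM - entropy_gauss(sigma) + sigma ** 2 / 2.0

sigmas = [0.5, 1.0, 2.0]
table = [(s, entropy_gauss(s), entropy_bound(s), cross_entropy_to_r(s)) for s in sigmas]
print(table)
```

The gap between the bound and the entropy equals $D(\pi \| r)$, which vanishes at $\sigma = 1$, where $\pi$ coincides with $r$.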



Theorem 3.7 follows from Corollary D.3 and Corollary D.6.